Contract No. NAS9-16785

Area Estimation Using Multiyear Designs and Partial Crop Identification 

Final Report 

This final report refers to project number 4821 entitled "Area Estimation 
Using Multiyear Designs and Partial Crop Identification". This project spanned 
the period from November 1, 1983, to March 31, 1984. 

1. INTRODUCTION 

Agriculture and other renewable resources can be economically inventoried 
over large areas using aerospace remote sensing techniques. In particular, the 
surface area devoted to a specific resource in a large region is especially
amenable to aerospace estimation. Such resources could be as broadly defined
as agriculture, forest, water, snow cover, etc. or as specifically defined as 
summer crops or corn. These area estimates can be combined with other measures 
such as estimated yield per acre to obtain production estimates. Once the 
appropriate estimation methodology has been successfully implemented, the 
successive estimates are very economical, so that frequent inventories are 
realistically obtainable. 

During 1975-1977 NASA in conjunction with the USDA conducted the Large 
Area Crop Inventory Experiment (LACIE) to illustrate the potential capabilities 
of aerospace remote sensing techniques. This pioneering effort also served to 
remove many of the obstacles for future applications. A summary of the
experiment is given in the proceedings of the LACIE Symposium (1978). The target
resource in LACIE was the wheat acreage and production in the U.S. Great


During the transition years 1977-1979 and during 1979-1983 under the 
recently-terminated AgRISTARS (Agriculture and Resources Inventory Surveys
through Aerospace Remote Sensing) program several advances were made in 
satellite imagery technology, data processing, and statistical methodologies. 
In addition, target resources were expanded to include other crops and other 
countries, as well as non-crop resources. 

The research under Contract No. NAS9-16785 has focused on the statistical 
methodology for estimating a particular resource's acreage proportion in a 
large region at a specified point in time using the estimated resource 
acreage proportion in a sample of smaller areas. In describing this research 
it will be assumed that 

(i) the resource is a crop, 

(ii) the specified time point of interest is the harvest time for the
crop, 

(iii) the sample areas are all the same size (a 5x6 nautical mile 
rectangle called a segment), and 

(iv) the sample segments are relatively "small" compared to the 
homogeneous region (stratum) of interest. 

Also, it is assumed that in each year of a multiyear period a sample of 
segments is selected. The composition of the sample may vary year to year. 

In each year each sample segment's at-harvest crop acreage proportion is 
estimated at one or more times during the crop growing season. The number 
of estimates is not necessarily the same for all sample segments in a year 
and is not necessarily the same for each year. Obviously, the contract
research has focused on only one part of a much larger problem. The region
of concern herein is really just one stratum in a stratified sample survey
of a country or the world (see, for example, Chhikara and Feiveson (1982)).

The size of the sample segment is assumed to be predetermined (see Chhikara 
and Feiveson (1982) and Chhikara et al. (1984)). Also, since the same
segments do not have to be in the sample every year, there is an interesting 
associated problem of determining an optimal multi-year sampling design 
(see Chhikara et al. (1984), Gbur and Sielken (1980a), Gbur and Sielken (1980b), 
Gbur and Sielken (1981) and the discussion in Section 4). The papers by 
Heydorn (1984) and Hall and Houston (1984), for example, discuss the 
determination of the sample segment's estimated at-harvest crop acreage 
proportion. Finally, the estimates arising from the statistical methodology 
developed under this research contract and the preceding contract (No.
NAS9-13894) can be input to procedures for aggregating acreage over several
regions and combining acreage estimates with yield estimates to obtain 
production estimates. The paper by Feiveson (1984) is a good example of the 
research addressing these latter needs. 

2. OVERVIEW OF RESEARCH ACTIVITY 

The two major tasks under Contract No. NAS9-16785 were 

1) the development and refinement of sampling and modeling techniques, and 

2) the development and refinement of aggregation techniques. 

The principal research activities associated with the development and
refinement of the sampling and modeling techniques were

1) the extension of multiyear models and estimation procedures to include 
partial ground cover identification, and 

2) the development of a procedure to determine the optimal current year 
sampling design as a function of previous years' results.

The major activities concerning the development and refinement of aggregation
techniques were 

1) the identification of statistical methodology for utilizing different 
weighting factors which could be assigned to the observations, and 

2) the derivation of approximate variances for ground cover estimators 
which incorporate partially identified sampled units. 

These four major activities are discussed in the next four sections,
respectively (Sections 3-6). Section 7 indicates some additional research
results. Section 8 concludes this final report and makes a suggestion for
future research.

3. EFFICIENT ACREAGE ESTIMATION USING MULTIYEAR DATA 
WITH BOTH PARTIALLY AND COMPLETELY IDENTIFIED SAMPLING UNITS

Each stratum at-harvest crop acreage proportion could be modeled using a 
regression approach with explanatory variables such as the past, present, and 
anticipated economic and meteorological conditions. However, the unknown form
of the regression model, the large number of possible explanatory variables,
and the difficulty in obtaining reasonable values for these variables make
this approach unattractive. Nevertheless, the combined effect of all of these
variables is reflected in the crop acreage proportions for the stratum 
segments. Although it is not economical to estimate the at-harvest crop 
acreage proportion for every segment in the stratum, it is feasible to estimate 
them for a sample of segments using Landsat data (see, for example, Hall and
Houston (1984) and Heydorn (1984)). Hence, an alternative approach is to 
model the estimated at-harvest crop acreage proportion for a sample in terms 
of 

(i) the stratum at-harvest crop acreage proportion, 

(ii) stratum-wide influences which vary from year to year,

(iii) characteristics of the segment itself,

(iv) yearly influences which affect different segments differently, and 

(v) the proportion of the growing season which has passed at the time 
the estimate is determined. 

These factors may only contribute roughly additively to a transformation of
the segment at-harvest crop acreage proportion and may not contribute
additively to the segment proportion itself.

One specific model which is compatible with these ideas is 

    $y(\hat{p}_{ts\ell}) = \alpha_t + b_s + \delta_\ell + e_{ts\ell}$,
        $t = 1, \ldots, T$;  $s = 1, \ldots, S$;  $\ell = 1, \ldots, L$,        (1)

where

$\hat{p}_{ts\ell}$ = the estimated proportion of the s-th segment's acreage that will
    contain the crop at harvest time in the t-th year when the
    estimate is made at crop calendar time $\ell$ (for example, $\ell = 1$
    could denote early season, $\ell = 2$ mid-season, and $\ell = 3$ harvest
    time);

$y(\hat{p}_{ts\ell})$ = a transformation of $\hat{p}_{ts\ell}$;

$\alpha_t$ = the stratum's transformed crop acreage proportion for the t-th
    year;

$b_s$ = the s-th sampled segment's departure from the stratum's transformed
    crop acreage proportion; the $b_s$'s are independent random variables
    each with mean zero and variance $\sigma_b^2$;

$\delta_\ell$ = the systematic difference between the estimates of the crop's
    transformed at-harvest acreage proportion made at the $\ell$-th crop
    calendar time and the corresponding estimate made at harvest time;

$e_{ts\ell}$ = aggregate sampling and classification errors in the
    transformed data; the $e_{ts\ell}$'s are independent random variables
    each with mean zero.

This model is, of course, not the most general model possible. In particular,
the segment effects $b_s$ are assumed to be independent of the crop calendar
time and the year. Also the departures of the transformed observations
$y(\hat{p}_{ts\ell})$ on the same segment from their fixed year effects $\alpha_t$
and their fixed estimation time effects $\delta_\ell$ are assumed to be positively
correlated. The error terms $e_{ts\ell}$ are the composite effect of many
components and need not have homogeneous variances; in particular see Heydorn
(1984) for a detailed discussion of the classification error components.
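As a minimal illustration of the structure of model (1), the following Python sketch generates synthetic transformed observations. The logit transformation, the parameter values, and the function names are assumptions made only for this example; the report does not specify the form of y.

```python
import math
import random

def logit(p):
    """Assumed transformation y(p) = log(p / (1 - p)); the report leaves y unspecified."""
    return math.log(p / (1.0 - p))

def inv_logit(y):
    """Inverse of the assumed transformation, mapping back to a proportion."""
    return 1.0 / (1.0 + math.exp(-y))

def simulate_model1(alpha, delta, sigma_b, sigma_e, n_segments, seed=0):
    """Return y[t][s][l] = alpha[t] + b[s] + delta[l] + e[t][s][l]."""
    rng = random.Random(seed)
    # Segment effects are drawn once and shared by every year and calendar time.
    b = [rng.gauss(0.0, sigma_b) for _ in range(n_segments)]
    return [[[a_t + b_s + d_l + rng.gauss(0.0, sigma_e) for d_l in delta]
             for b_s in b]
            for a_t in alpha]

# Hypothetical inputs: T = 3 years, L = 3 calendar times (no bias at harvest).
alpha = [logit(0.20), logit(0.25), logit(0.30)]
delta = [0.15, 0.05, 0.0]
y = simulate_model1(alpha, delta, sigma_b=0.3, sigma_e=0.1, n_segments=5)
p_hat = [[[inv_logit(v) for v in seg] for seg in year] for year in y]
```

Because each segment effect is shared across years, observations on the same segment are positively correlated, which is the feature the multiyear estimators exploit.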

The primary objective is to estimate the crop's at-harvest proportion of
the stratum acreage in the current year, T; that is, estimate
$P_T = y^{-1}(\alpha_T)$. Secondary objectives could be improved estimates of
at-harvest acreages in previous years or estimates of changes in the stratum
at-harvest crop acreage proportion from year to year.

Estimates of the stratum at-harvest crop acreage proportion are also
often desired throughout the current year as well as at harvest time. For
example, an early season estimate of $P_T$ based on observations for
$\ell = 1, \ldots, L$ for $t = 1, \ldots, T-1$ and only $\ell = 1$ for $t = T$ is
frequently desired.

Even though the estimate $\hat{P}_T = y^{-1}(\hat{\alpha}_T)$ of the stratum at-harvest
crop acreage proportion for the current year involves only $\hat{\alpha}_T$, this
estimate depends on the entire multiyear data set and not just the data from
year T since the segment effects ($b_s$'s) and systematic estimation time biases
($\delta_\ell$'s) are assumed to be constant from year to year.

Special cases of model (1) have also been considered. For example, Chhikara
et al. (1984) consider at-harvest estimates made only at harvest time, so that
their model is

    $\hat{p}_{ts} = \alpha_t + b_s + e_{ts}$,  $t = 1, \ldots, T$ and $s = 1, \ldots, S$.

For simplicity Feiveson (1984) considers only estimates of the stratum at-harvest
crop acreage proportion made at harvest time during the current year; i.e.,

    $\hat{p}_{Ts} = \alpha_T + e_{Ts}$,  $s = 1, \ldots, S$.

When such data is not available, Feiveson (1984) utilizes historical data from 
agricultural reports even though previous Landsat data could also be incorporated. 
The methodology in both of these papers can be extended to incorporate the more 
general model (1). 

Multiyear estimation models provide the ability to make estimates of the 
current year's acreage on the basis of not only the current year's sampled data, 
but also the previous years' sampled data. In the past such multiyear models 
have been developed and used when the sampled data is the proportional acreage 
of a single crop of interest. In such cases the use of multiyear models can 
easily reduce the variation in the current year's estimate to one half of 
what it would have been if the previous years' sampled data were ignored. 

In Sielken (1981) techniques were developed and tested for estimating the 
acreage for a particular crop when there is only sampled data from a single year 
and some of the segments have been only partially identified. A segment is said 
to only be partially identified as opposed to completely identified if only the 
proportion of the segment containing some unknown percentage mixture of two or 
more ground covers (including the specified crop of interest) is estimated. 

In developing these estimation techniques consideration has been given to the
following approaches: 


a) maximum likelihood estimation, 

b) least squares methods, 

c) weighted least squares methods, and 

d) a combination of a least squares ratio estimator of the specified 
crop's acreage percentage, say R, within the combined acreage of all 
crops in the mixture and a maximum likelihood estimator of the mixture 
acreage. 

Empirically, approach (d) has usually performed as well as, if not better
than, approaches (a)-(c).

The following procedure is recommended for estimating a crop's current
year acreage within a stratum based on both the current year's data and
previous years' data when these data involve both partially and completely
identified sampling units:

i) Determine the least squares ratio estimator, $\hat{R}$, for the crop of
interest using the current year's partially and completely
identified sampling units.

ii) Transform each multivariate segment observation into a univariate
observation by combining all of the acreages for the crops involved
in the mixture of crops creating the partial identification. Call
this combination of crop acreages the mixture acreage.

iii) Apply the multiyear modeling and estimation procedures to the multiyear
data set consisting of the observed segment mixture acreages. Let
$\hat{P}_M$ denote the corresponding estimated proportion of the stratum's
current year acreage containing crops in the mixture.

iv) Estimate the stratum's current year acreage proportion for the crop
of interest by the product $\hat{R} \cdot \hat{P}_M$.

Time series or regression models can be used to augment step (i) in the
above procedure if trends over time in the specified crop's ratio R are
anticipated or if covariates for R can be identified.
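Steps (i) and (iv) of the procedure can be sketched in Python as follows. The sketch assumes that the least squares ratio estimator is the through-origin regression slope of crop acreage on mixture acreage over completely identified segments; that form, the function names, and the numbers are illustrative assumptions, not the definition used in Sielken (1981). The mixture proportion from step (iii) is simply supplied as an input.

```python
def ls_ratio(crop_acres, mixture_acres):
    """Assumed form of R-hat: through-origin least squares slope of crop on mixture acreage."""
    num = sum(c * m for c, m in zip(crop_acres, mixture_acres))
    den = sum(m * m for m in mixture_acres)
    return num / den

def crop_proportion(r_hat, p_mixture_hat):
    """Step (iv): product of the ratio estimate and the mixture proportion estimate."""
    return r_hat * p_mixture_hat

# Hypothetical completely identified segments (acreage proportions).
crop = [0.10, 0.20, 0.15]
mixture = [0.20, 0.40, 0.30]
r_hat = ls_ratio(crop, mixture)

# Hypothetical multiyear estimate of the stratum mixture proportion from step (iii).
p_crop = crop_proportion(r_hat, 0.35)
```

In this constructed example the crop occupies exactly half of the mixture in every segment, so the ratio estimate is one half and the crop proportion is half of the mixture proportion.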

4. OPTIMAL CURRENT YEAR SAMPLING DESIGNS
BASED ON PREVIOUS YEARS' DATA

Here a sampling design is a plan which defines the way in which the sample 
of segments is to be chosen from a stratum's population of segments. An optimal 
design yields estimates which have optimal properties. In the past, sampling 
designs in support of ground cover proportion estimation have specified at the 
outset of the study how the sampling is to be done in each year of the study.
As these designs are being implemented considerable information is gathered.
For example, cloud cover may have eliminated particular observations and improved
estimates of relevant variances may have become available. Such information is 
not incorporated in the original non-sequential design. However, a sequentially 
determined sampling design which allows information from previous years to 
influence the current year's design should produce sampling designs leading to 
better estimates. 

The use of a multiyear mixed model weighted analysis of variance to estimate 
a stratum's at-harvest crop acreage proportion based on estimated proportions 
from sampled segments has been described in Dahm and Sielken (1980). The 
selection of multiyear sampling designs as described in Gbur and Sielken (1980a, 
1980b, and 1981) was based on two simplifying assumptions. First, the design 
selection procedure did not take into account any previous sampling information
on the stratum nor did it allow sampling information obtained during the early
periods of the design to affect sampling in subsequent periods. Second, in an
attempt to reduce the number of competing designs to a manageable level, it was 
assumed that the number of segments to be sampled in each future year was the 
same. 

Yearly changes in economic conditions, measurement techniques, equipment 
characteristics, and reliability requirements suggest that a more realistic
approach would be to sequentially select each year's sampling pattern. Such a 
sequential approach would utilize the information collected from all previous 
years' sampling of the stratum. It would allow for the selection of a sampling 
pattern for each year which reflects the effects of missing observations in 
previous years' samples as well as changes in factors such as those mentioned 
above. 

In a new technical report (Gbur and Sielken (1983)) a computer program 
called OPTDESIGN is documented which enables the user to obtain a list of the 
best sampling patterns for a stratum for the current year based on the segment 
proportion information from all previous years. Two criteria for design 
selection have been implemented. These are the minimization of the variance 
of the current year's estimated stratum transformed proportion and the 
minimization of the variance of the estimated change from the previous year's 
stratum transformed proportion. 

Since the variances of the $y(\hat{p}_{ts\ell})$'s in model (1) are not necessarily
equal, a weighted form of the model (1) has been used. In matrix notation this
model can be expressed as

    $WY = WX \begin{bmatrix} \alpha \\ \delta \end{bmatrix} + WUb + \epsilon$,        (2)

where

    $Y = [y_{111}, y_{112}, \ldots, y_{TSL}]'$,

    $\alpha = [\alpha_1, \ldots, \alpha_T]'$,

    $\delta = [\delta_1, \ldots, \delta_{L-1}]'$,

    $b = [b_1, \ldots, b_S]'$,

    $W$ = weight matrix = $[w_{ij}]$,

    $X$ = design matrix for the fixed effects ($\alpha_t$'s and $\delta_\ell$'s),

    $U$ = design matrix for the random effect ($b_s$'s),

    $\epsilon = [\epsilon_{111}, \epsilon_{112}, \ldots, \epsilon_{TSL}]' = We$
        = vector of transformed errors.

In the weighted model (2), the estimates of $\alpha_t$ are obtained from the
appropriate entries of $(X'W'V^{-1}WX)^{-1} X'W'V^{-1}WY$ and the covariance matrix
of the vector $\hat{\alpha}$ is the upper left block of the matrix

    $\Sigma = (X'W'V^{-1}WX)^{-1} \sigma_e^2$,

where

    $V = I + \gamma\, WUU'W'$,

    $\gamma = \sigma_b^2 / \sigma_e^2$.

The stratum at-harvest crop acreage proportion is estimated by
$\hat{P}_t = y^{-1}(\hat{\alpha}_t)$.
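The generalized least squares computation for model (2) can be sketched with NumPy as below. This is only an illustration under stated assumptions: W is taken to be diagonal, the variance ratio $\gamma$ is treated as known, only harvest-time observations are used (so there are no $\delta$ terms), and the function name and data are hypothetical. It is not the implementation documented in Dahm and Sielken (1980).

```python
import numpy as np

def gls_fixed_effects(y, X, U, w, gamma):
    """Return (estimate, covariance / sigma_e^2) of the fixed effects in model (2).

    y: transformed observations; X, U: fixed and random effect design matrices;
    w: diagonal entries of an assumed diagonal W; gamma: sigma_b^2 / sigma_e^2.
    """
    W = np.diag(w)
    WX, WU, Wy = W @ X, W @ U, W @ y
    # V = I + gamma * W U U' W' is the covariance of WY divided by sigma_e^2.
    V = np.eye(len(y)) + gamma * (WU @ WU.T)
    V_inv = np.linalg.inv(V)
    cov = np.linalg.inv(WX.T @ V_inv @ WX)
    estimate = cov @ WX.T @ V_inv @ Wy
    return estimate, cov

# Hypothetical data: T = 2 years, S = 2 segments, harvest-time estimates only.
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # year indicators
U = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)  # segment indicators
y = np.array([1.1, 0.9, 2.1, 1.9])
w = np.ones(4)
alpha_hat, cov = gls_fixed_effects(y, X, U, w, gamma=2.0)
```

For this balanced example the estimates reduce to the yearly means of the transformed observations, and the positive off-diagonal covariance reflects the segment effects shared between years.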

In determining the optimal sampling design, it is assumed that information
from years $t = 1, 2, \ldots, T$ is available for the stratum under consideration.
Since within season segment proportion estimates are often not available and
are not particularly important in design selection, our procedure only utilizes
the at-harvest segment proportion estimates made at harvest time. Therefore,
the within season biases $\delta_\ell$ in model (2) are eliminated.

The required information for OPTDESIGN for each stratum consists of

i) segment identification numbers for each segment sampled in each previous
year,
ii) the final estimated weight, $w_{tsL}$, associated with each estimated
transformed acreage proportion, $y(\hat{p}_{tsL})$, and
iii) an estimate of the variance component ratio $\gamma$.

The estimated segment proportions $\hat{p}_{tsL}$ are not required for design
selection, except insofar as they may be needed to calculate the estimated
weights $w_{tsL}$. Since the covariance matrix for each design contains the same
(unknown) multiplier $\sigma_e^2$, the particular value of $\sigma_e^2$ does not
need to be considered in the design selection process.

The optimality measures implemented in OPTDESIGN are

i) minimize $\mathrm{var}(\hat{\alpha}_{T+1})$, the variance of the estimated stratum
transformed at-harvest crop acreage proportion for the current year,
ii) minimize $\mathrm{var}(\hat{\alpha}_{T+1} - \hat{\alpha}_T)$, the variance of the
estimated change in the stratum transformed at-harvest crop acreage
proportion from the previous year.

The minimizations in (i) and (ii) are determined over the set of all possible
(T+1)-year designs containing the specified number of segments to be sampled in
the current (T+1)-th year and for which the parent T-year design is given by the
sampling history of the stratum.
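The search over candidate current-year designs can be illustrated by the following Python sketch. It is not the OPTDESIGN Fortran program: the function names, the exhaustive enumeration, the diagonal weights, and the toy sampling history are all assumptions made for this example. Both criteria are returned in units of $\sigma_e^2$, which is common to every design.

```python
from itertools import combinations
import numpy as np

def design_criteria(history, new_segments, new_weight, gamma):
    """history: list of (year, segment, weight), years labeled 1..T.

    Returns (var(alpha_{T+1}), var(alpha_{T+1} - alpha_T)) divided by sigma_e^2.
    """
    T = max(year for year, _, _ in history)
    obs = list(history) + [(T + 1, seg, new_weight) for seg in new_segments]
    segs = sorted({seg for _, seg, _ in obs})
    n = len(obs)
    X = np.zeros((n, T + 1))       # fixed year effects alpha_1, ..., alpha_{T+1}
    U = np.zeros((n, len(segs)))   # random segment effects
    w = np.zeros(n)
    for i, (year, seg, weight) in enumerate(obs):
        X[i, year - 1] = 1.0
        U[i, segs.index(seg)] = 1.0
        w[i] = weight
    WX = w[:, None] * X            # diagonal W applied row by row
    WU = w[:, None] * U
    V = np.eye(n) + gamma * (WU @ WU.T)
    cov = np.linalg.inv(WX.T @ np.linalg.solve(V, WX))
    contrast = np.zeros(T + 1)
    contrast[T] = 1.0              # current year T+1
    contrast[T - 1] = -1.0         # previous year T
    return float(cov[T, T]), float(contrast @ cov @ contrast)

def best_designs(history, candidates, n_new, new_weight, gamma, criterion=0, n_opt=3):
    """Rank every subset of n_new candidate segments; criterion 0 is (i), 1 is (ii)."""
    scored = [(design_criteria(history, d, new_weight, gamma)[criterion], d)
              for d in combinations(candidates, n_new)]
    return sorted(scored)[:n_opt]

# Hypothetical history: segments A and B sampled in years 1 and 2 with unit weights.
history = [(1, "A", 1.0), (1, "B", 1.0), (2, "A", 1.0), (2, "B", 1.0)]
best_change = best_designs(history, ["A", "B", "C", "D"], n_new=2,
                           new_weight=1.0, gamma=4.0, criterion=1)
```

In this toy case, resampling the previous year's segments is best for estimating year-to-year change, since the shared segment effects cancel in the difference.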

OPTDESIGN is a self-contained computer program for determining the best 
designs according to the optimality criteria described above. It is written 
in Fortran and contains numerous comment cards which provide extensive internal 
documentation. A listing of OPTDESIGN and a flowchart of the program logic, 
as well as sample inputs and corresponding outputs for OPTDESIGN are given in 
Gbur and Sielken (1983). 

For each stratum the following information is printed in the output from
OPTDESIGN:


(1) Initial Information including 

(a) stratum number, 

(b) number of years of prior information, 

(c) number of segments sampled in each previous year, 

(d) number of segments to be sampled in the current year, 

(e) estimate of the variance component ratio $\gamma$,

(f) weight to be assigned to all current year segments for the purpose 
of computing the optimality measures. 

(2) For each previously sampled segment,

(a) segment label, 

(b) year the segment was sampled, 

(c) weight attached to that observation. 

(3) A list of the NOPT best designs for the stratum for each optimality
measure above, along with the value of the criterion for each design.

The program OPTDESIGN has been written to allow for as much flexibility 
as possible in the sampling history of the stratum. The only unchangeable 
restriction is that at least one year of prior information is required. The 
program will accept any positive numbers as weights and arbitrary sample sizes
for each previous year in which the stratum has been sampled. 

Since the weights assigned to each previously sampled segment are, from 
the program's viewpoint, arbitrary positive numbers, they can be used to 
reflect many different factors. The weights need not be computed solely as 
functions of the estimated segment proportions. They could be used to account 
for such factors as changes in measurement techniques, classification algorithms, 
and equipment characteristics as well as factors such as differences in the 
level of difficulty of classification for the AI, number and quality of the
set of "photographs" used to obtain the estimate, and differences in AI
personnel.

The current version of the program assumes that the weight matrix is 
diagonal. However, the data input format can be easily modified to allow for 
arbitrary nonnegative weight matrices. 

The sample sizes for previous years' sampling are arbitrary positive
integers. This allows for differences in sample sizes caused by factors such
as missing observations in one or more years, budgetary changes, and the
targeting of selected strata for more intensive sampling in certain years.

It is conceivable that the sampling history of a stratum contains T* years 
in which no sampling occurred. Since OPTDESIGN requires the previous years to 
be labeled as 1, 2, ..., T, the years in which the stratum was sampled could 
be numbered consecutively as 1, 2, ..., T-T* and all years' information 
utilized. 

"Ground truth" data could be combined with the stratum's sampling history. 
The weights for such "ground truth" estimates should reflect any differences 
in their quality and variability as compared to the remotely sensed segment 
estimates. 

The multiyear model (2) on which the program OPTDESIGN is based is
relatively simple. Additional fixed effects and covariates could be
incorporated to improve the estimates. Modification of OPTDESIGN to reflect the
expanded model can be achieved in a straightforward manner by substitution of
a new subroutine for computing the fixed effects design matrix X. The inclusion
of additional random effects in the model would require more extensive
modification of the program, but could also be accomplished.

5. INCORPORATING WEIGHTING FACTORS INTO THE
STATISTICAL METHODOLOGY FOR MULTIYEAR DATA

Current multiyear estimation methodology uses observations as if their 
variability was only dependent upon the true underlying proportion being 
estimated. In practice, however, the variability of an observation is 
dependent upon many other factors; for example, the season in which the 
observation is made, the amount of previous satellite imagery available, the 
quality of that imagery, the satellite being used, and the "closeness" of the
spatial-spectral-temporal patterns observed in the sampled units to their
classical prototypes (say for corn, soybeans, pasture, forest, etc.). Better
use of the observations can be made in aggregation if greater weight can be 
given to the more precise observations and lesser weight given to less precise 
observations. Hence, better aggregation estimates should be obtainable if 
the precision of the observations is more accurately assessed and then 
incorporated into the multiyear area estimation techniques. This is 
particularly important in the multiyear environment where satellite technology, 
analyst and computer methodologies, etc. are hopefully improving from year to 
year. 

The suggested approach is to characterize precision or confidence in
the observations in terms of their variances and weight the observations
proportionately to the inverse of their variances. The
$\mathrm{Var}(\hat{p}_{ts\ell})$ can be
approximated on the basis of information such as

(i) the type of satellite being used, 

(ii) the sharpness of the satellite imagery,

(iii) the season during which the estimate is being made, 

(iv) the number of satellite images successfully obtained by the time 
the segment proportion is estimated,

(v) the nearness of the segment's observed behavior to classical 
crop profiles, 

(vi) the weather conditions during the crop's growing season, and 

(vii) the physical characteristics of the segment. 

In addition, recognizable segment characteristics which make it either easier
or harder to estimate the segment's crop proportion can be incorporated.
Obvious differences in the amount of information going into the
$\hat{p}_{ts\ell}$ can also be reflected. These latter differences can be due to
the estimation times themselves as well as due to loss of satellite imagery
from cloud cover, machine failure, etc.
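The basic idea of weighting proportionately to the inverse of the variances can be sketched in Python as follows. This is a deliberately simplified illustration for independent estimates of a common quantity, not the weighted mixed model analysis of model (2); the function names and numbers are hypothetical.

```python
def inverse_variance_weights(variances):
    """Weights proportional to 1 / variance, normalized to sum to one."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]

def weighted_mean(values, variances):
    """Combine estimates so that more precise observations count for more."""
    weights = inverse_variance_weights(variances)
    return sum(w * x for w, x in zip(weights, values))

# Hypothetical segment estimates: the first is three times as precise as the second.
estimates = [0.30, 0.40]
approx_variances = [0.01, 0.03]
weights = inverse_variance_weights(approx_variances)
combined = weighted_mean(estimates, approx_variances)
```

Here the more precise estimate receives three quarters of the total weight, so the combined value lies closer to it.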

The statistical procedures for area estimation documented in Dahm and 
Sielken (1981) can easily incorporate as input both the observation and its 
weight (confidence measure). The weighted form of the multiyear model 
(1) is the model (2) discussed in Section 4. Detailed procedures for 
implementing the statistical analyses associated with model (2) are given in 
Dahm and Sielken (1981). 

The advantages and disadvantages of doing weighted analyses of linear
models as opposed to unweighted analyses when the observations have unequal
reliability or variances are well documented in the statistical literature
(see, for example, Draper and Smith (1981), Kleijnen (1981), and Scheffe (1959)).

6. VARIANCES FOR GROUND COVER ESTIMATORS 
INCORPORATING PARTIALLY IDENTIFIED SAMPLED UNITS 

In the past, large scale ground cover area estimation techniques have 
been developed for a single year's data which may include partially identified 
sampled units. In order for these techniques to support aggregation activities 
some statement of the uncertainty of the estimate must be conveyed. This is
best handled by providing an estimate of the variance of the ground cover area
estimator. Within a large homogeneous area (called a stratum) a sample of
segments (currently 5 by 6 nautical mile rectangles) is observed. These
observations are collected as satellite imagery and are available for a period
of a few years and at several times during the crop growing seasons. Using
these segment acreages, segment proportions for several crops are estimated.
Difficulties in distinguishing between crops lead to partially identified
segments as opposed to completely identified segments. Herein, a segment will
be considered to be planted in two major crops, crop 1 and crop 2; the remainder
of the segment will be pooled under crop 3, "other". When crop 1 and crop 2
are distinguishable the segment is completely identified. If it is not
possible to distinguish them, the segment is partially identified. Both types
of segments can be combined to estimate a crop's proportional acreage in the
stratum.

Methods of estimating individual crop acreage using a mixture of completely 
and partially identified segments have been discussed in Sielken (1981) and 
(1982). 

The assumption used in Sielken (1982) is that the number of acreage units 
harvested in a segment follows a multinomial distribution. An acreage unit 
will be hereafter referred to as a block. The number of blocks within a 
segment planted in crop i is denoted by $Y_i$, and the total number of blocks in
a segment is denoted by N. Under the multinomial assumption the $Y_i$'s have the
distribution

    $P(Y_1 = y_1, Y_2 = y_2) = \frac{N!}{y_1!\, y_2!\, y_3!}\, p_1^{y_1} p_2^{y_2} p_3^{y_3}$,

where

    $N = y_1 + y_2 + y_3$

and $p_i$ is the at-harvest proportion of the stratum planted in crop i. This
assumption is correct if every decision maker acts independently and allocates
each block independently to crop 1, crop 2, or "other" with probabilities
$p_1$, $p_2$, and $p_3 = 1 - p_1 - p_2$, respectively.
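The multinomial probability above can be evaluated directly, as in this short Python sketch. The function name and the numerical values are illustrative assumptions.

```python
from math import factorial

def multinomial_pmf(y1, y2, y3, p1, p2):
    """P(Y1 = y1, Y2 = y2) for N = y1 + y2 + y3 blocks, with p3 = 1 - p1 - p2."""
    p3 = 1.0 - p1 - p2
    n = y1 + y2 + y3
    coefficient = factorial(n) // (factorial(y1) * factorial(y2) * factorial(y3))
    return coefficient * p1**y1 * p2**y2 * p3**y3

# Hypothetical segment of N = 2 blocks: one block in crop 1 and one in crop 2.
prob = multinomial_pmf(1, 1, 0, 0.2, 0.3)
```

Summing the function over all allocations of a fixed N gives one, which is a quick check on any chosen probabilities.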

A random sample of J segments is to be observed. Let $Y_{ij}$ = number of
blocks in segment j containing crop i, i = 1, 2, 3 and j = 1, ..., J. Assume
the segments $j = 1, \ldots, J_C$ are completely identified and segments
$j = J_C + 1, \ldots, J_C + J_P$ are partially identified. Therefore,
$J = J_C + J_P$. Let

    $Z_{Ci} = \sum_{j=1}^{J_C} Y_{ij}$,  $i = 1, 2, 3$,

    $Z_{P12} = \sum_{j=J_C+1}^{J_C+J_P} (Y_{1j} + Y_{2j})$,

    $Z_{P3} = \sum_{j=J_C+1}^{J_C+J_P} Y_{3j}$.

Thus, $Z_{Ci}$ is the total number of blocks containing crop i in the completely
identified segments. The total number of blocks containing either crop 1 or
crop 2 in the partially identified segments is $Z_{P12}$. The total number of
blocks containing crop 3 in the partially identified segments is $Z_{P3}$. The
total number of blocks completely (partially) identified is $N_C$ ($N_P$). Thus,
if N is the number of blocks in one segment,

    $N_C = J_C N$,

    $N_P = J_P N$.

As noted in Vidart and Sielken (1984) the results from Hocking and Oxspring
(1971) can be used to show that the maximum likelihood estimators are

    $\hat{p}_1 = [Z_{C1}/(Z_{C1} + Z_{C2})]\,[(Z_{C1} + Z_{C2} + Z_{P12})/(N_C + N_P)]$,

    $\hat{p}_2 = [Z_{C2}/(Z_{C1} + Z_{C2})]\,[(Z_{C1} + Z_{C2} + Z_{P12})/(N_C + N_P)]$,

and

    $\hat{p}_3 = 1 - \hat{p}_1 - \hat{p}_2 = (Z_{C3} + Z_{P3})/(N_C + N_P)$.

The form of these estimates is fairly intuitive since

    $\hat{p}_1$ = [ Estimated proportion of crop 1 and 2 that is crop 1 in the
                completely identified segments ] x
                [ Estimated proportion of crop 1 and 2 in all the segments ].
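The maximum likelihood estimators can be computed directly from the block totals, as in the following Python sketch. The function name and the counts are hypothetical; the formulas are those given above.

```python
def ml_estimates(z_c1, z_c2, z_c3, z_p12, z_p3):
    """ML estimates (p1, p2, p3) from complete (C) and partial (P) block totals."""
    n_c = z_c1 + z_c2 + z_c3      # blocks in completely identified segments
    n_p = z_p12 + z_p3            # blocks in partially identified segments
    # Proportion of all blocks in crop 1 or crop 2, using every segment.
    mixture = (z_c1 + z_c2 + z_p12) / (n_c + n_p)
    # Share of crop 1 within the mixture, from completely identified segments only.
    share_1 = z_c1 / (z_c1 + z_c2)
    p1 = share_1 * mixture
    p2 = (1.0 - share_1) * mixture
    return p1, p2, 1.0 - p1 - p2

# Hypothetical totals: 100 completely and 100 partially identified blocks.
p1_hat, p2_hat, p3_hat = ml_estimates(20, 30, 50, 60, 40)
```

With these counts the third estimate equals (50 + 40)/200, in agreement with the closed form for crop 3.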

The asymptotic variances (AV) of these estimates are

    $\mathrm{AV}(\hat{p}_1) = p_1(1-p_1)/N_C - [p_1^2 p_3 N_P]/[N_C (N_C + N_P)(p_1 + p_2)]$, and

    $\mathrm{AV}(\hat{p}_2) = p_2(1-p_2)/N_C - [p_2^2 p_3 N_P]/[N_C (N_C + N_P)(p_1 + p_2)]$.

The second term of these expressions shows the improvement obtained by using
the partially identified segments. After some simplification, the asymptotic
variances can be rewritten as

    $\mathrm{AV}(\hat{p}_1) = p_1(1-p_1)/(N_C+N_P) + [N_P p_1 p_2]/[N_C (N_C+N_P)(p_1+p_2)]$,

    $\mathrm{AV}(\hat{p}_2) = p_2(1-p_2)/(N_C+N_P) + [N_P p_1 p_2]/[N_C (N_C+N_P)(p_1+p_2)]$,

and for $\hat{p}_3 = 1 - \hat{p}_1 - \hat{p}_2$

    $\mathrm{AV}(\hat{p}_3) = p_3(1-p_3)/(N_C+N_P)$.

A computer program has been implemented to test the accuracy of these
asymptotic variances when $N_C$ and $N_P$ are not both arbitrarily large. Samples
with the prescribed number of partially and completely identified segments
were simulated following a multinomial distribution. For each sample the
maximum likelihood (ML) estimates of the p's were computed. Finally the sample
variances of these ML estimates were compared to the asymptotic variances.

The details of the evaluation of the applicability of the asymptotic variance
formulas to small sample sizes are given in Vidart and Sielken (1984). The
conclusion was that the asymptotic variance can be used as a good
approximation of the actual variance under the multinomial decision process
even for relatively small sample sizes.
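A comparison of this kind can be sketched in Python as follows. This is not the program described in Vidart and Sielken (1984): the function names, the probabilities, the block totals, and the number of replications are assumptions chosen for illustration, and blocks are drawn directly rather than segment by segment, which is equivalent under the multinomial assumption. The sketch also evaluates both algebraic forms of the asymptotic variance of the crop 1 estimator so that their agreement can be checked numerically.

```python
import random
from statistics import variance

def av_first_form(p1, p2, n_c, n_p):
    """Asymptotic variance of the crop 1 estimator, first form given above."""
    p3 = 1.0 - p1 - p2
    return (p1 * (1 - p1) / n_c
            - (p1**2 * p3 * n_p) / (n_c * (n_c + n_p) * (p1 + p2)))

def av_second_form(p1, p2, n_c, n_p):
    """The same asymptotic variance after simplification."""
    return (p1 * (1 - p1) / (n_c + n_p)
            + (n_p * p1 * p2) / (n_c * (n_c + n_p) * (p1 + p2)))

def simulate_p1_hat(p, n_c, n_p, rng):
    """One ML estimate of p1 from n_c complete and n_p partial multinomial blocks."""
    complete = rng.choices(range(3), weights=p, k=n_c)
    partial = rng.choices(range(3), weights=p, k=n_p)
    z_c1 = complete.count(0)
    z_c2 = complete.count(1)
    z_p12 = n_p - partial.count(2)   # crops 1 and 2 are not distinguished here
    return z_c1 / (z_c1 + z_c2) * (z_c1 + z_c2 + z_p12) / (n_c + n_p)

rng = random.Random(12345)
p = (0.2, 0.3, 0.5)                  # hypothetical stratum proportions
n_c, n_p = 200, 200                  # hypothetical block totals
draws = [simulate_p1_hat(p, n_c, n_p, rng) for _ in range(2000)]
empirical = variance(draws)
asymptotic = av_second_form(p[0], p[1], n_c, n_p)
```

The two forms give the same value for any admissible inputs, and the ratio of the empirical to the asymptotic variance indicates how well the approximation holds at the chosen sample sizes.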

One Monte Carlo study of the empirical behavior of the crop acreage
estimation procedure utilizing partially identified data was already
available in Sielken (1982). The sample variances of the maximum likelihood
estimates of the $p_i$'s in Sielken (1982) can be compared to the asymptotic
variances under the multinomial assumption. In order to compute the
asymptotic variances of the maximum likelihood estimators of the p's, the
number of blocks N contained in a segment must be determined. This information
is not available since the CAMS estimates are given in percentages rather
than blocks. Therefore, N was estimated separately for i = 1, 2, and 3 and
for different combinations of $J_C$ and $J_P$. For a particular crop the estimated
value of N is nearly the same for the different combinations of $J_C$ and $J_P$.
However, the value of N seems to vary with the crop. In other words, the
theoretical variances under the multinomial decision process differed markedly
from the observed sample variances. Consequently, the multinomial decision
process is not applicable.



Since the multinomial assumption does not hold, another decision process
must be considered. The estimated N values suggest that some crops are
planted in a larger "standard area" than others. The standard area of a
"large" crop has more blocks than that of a "small" crop. Crop 3 ("other")
appeared to be a "large" crop and crop 1 a "small" crop. This suggested the
following approach. A block will now denote a particular fixed number of
acres corresponding to the smallest decision possible. Let $\kappa_i$ denote the
theoretical number of blocks in a standard area of crop i. This
conceptualization envisions crop i being planted only in integer multiples of
$\kappa_i$ blocks.

If $p_i$ is equal to the overall proportion of the stratum planted with
crop i, then this alternative decision process independently allocates each
$\kappa_3$ blocks of acreage according to the following sequential procedure:

1. Allocate $\kappa_3$ blocks to crop 3 with probability $p_3$.

2. If the $\kappa_3$ blocks are not allocated to crop 3, then allocate those
$\kappa_3$ blocks to crops 1 and 2 as follows:

2-1. Allocate $\kappa_2$ blocks to crop 2 with probability

$p_2' = p_2/(1-p_3)$.

2-2. If these $\kappa_2$ blocks are not allocated to crop 2 during step 2-1,
then allocate these $\kappa_2$ blocks to crop 1.

Obviously, it is assumed that $\kappa_1 = \kappa_2$, N is an integer multiple of
$\kappa_3$, and $\kappa_3$ is an integer multiple of $\kappa_1$. For simplicity
$\kappa_1$ and $\kappa_2$ are defined to be 1 block and $\kappa_3$ is renamed K.
The resulting decision process can be summarized as

1. Allocate K blocks to crop 3 with probability $p_3$.

2. If these K blocks are not allocated to crop 3 during step 1, allocate
those K blocks to crops 1 and 2 using a binomial decision process with
probabilities $p_1'$ and $p_2' = 1 - p_1'$ where

$p_1' = p_1/(1-p_3)$.

3. Repeat steps 1 and 2 until all N blocks are allocated.

This alternative decision process will be referred to as the KDP. A group of
K blocks will be called a superblock. The particular case where K = 1 is the
multinomial decision process, 1DP.
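The three summarized steps of the KDP can be sketched as a small simulation.
This is only an illustration of the allocation rule as stated above; the
function name and argument names are hypothetical, and $p_3 < 1$ is assumed
so that the conditional probability $p_1/(1-p_3)$ is defined.

```python
import random


def simulate_kdp_segment(p1, p2, n_blocks, k, rng):
    """Allocate n_blocks blocks to crops 1, 2, 3 under the KDP.

    Returns [blocks of crop 1, blocks of crop 2, blocks of crop 3].
    """
    if n_blocks % k != 0:
        raise ValueError("n_blocks must be an integer multiple of k")
    p3 = 1.0 - p1 - p2
    p1_cond = p1 / (1.0 - p3)  # probability of crop 1 given "not crop 3"
    counts = [0, 0, 0]
    for _ in range(n_blocks // k):  # one pass per superblock
        if rng.random() < p3:
            counts[2] += k  # step 1: the whole superblock goes to crop 3
        else:
            for _ in range(k):  # step 2: binomial split, block by block
                if rng.random() < p1_cond:
                    counts[0] += 1
                else:
                    counts[1] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    print(simulate_kdp_segment(0.2, 0.3, n_blocks=60, k=6, rng=rng))
```

Setting `k=1` gives the multinomial decision process (1DP), since each block
is then allocated to crops 1, 2, and 3 with probabilities $p_1$, $p_2$, and $p_3$.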

The parameters in KDP include N, the number of blocks in a segment, and
K, the number of blocks in a superblock, as well as $p_1$, $p_2$, and $p_3$.
In Vidart and Sielken (1984) it is shown that the maximum likelihood estimates
for the $p_i$'s under the KDP are the same as under 1DP and do not depend upon
N or K. However, the asymptotic variances of the $\hat{p}_i$'s do depend upon
N and K which are both unknown. In Vidart and Sielken (1984) estimators for N,
K, and approximations for the variances of the $\hat{p}_i$'s are derived. Also
a simulation check on the approximate expressions for the variances of the
$\hat{p}_i$'s is reported there. The sample variances of the $\hat{p}$'s were
very close to their approximating expressions.

In Vidart and Sielken (1984) the KDP is also extended to the situation 
where the sampling units have variable sizes instead of the constant size 
typified by 5x6 nautical mile segments. Such a situation could easily occur 
if the sampling units were political subdivisions such as counties. 

One objective of the contract research was to determine the improvement
brought about through the use of the KDP instead of the 1DP when CAMS estimates
are studied. Some improvement in the prediction of the variance of the crop
acreage estimators is achieved by considering the new decision process. For
fairly large samples, typically 50 segments, the use of the KDP as opposed
to the 1DP leads to an improvement in the prediction of the variances of the
ML estimators of the crop proportions based on CAMS data. For smaller, more
realistic size samples, typically 5 segments, the variance estimation
techniques were not very accurate. However, the empirical results indicate
a better performance under the KDP than under the 1DP. The variance
estimates under KDP have a distribution with more spread but centered much
closer to the sample variance than the corresponding distribution under 1DP.

The greatest overall improvement is associated with the estimated variance for the
smallest crop (i.e., the crop planted in the smallest size blocks) while the
other estimated variances improve just slightly overall.

7. ADDITIONAL RESEARCH RESULTS 

A special issue of Communications in Statistics concerning statistical 
applications at NASA is being prepared under the coordination of Dr. Raj 
Chhikara, Lockheed Engineering and Management Services Company, Inc. R. L. 
Sielken, Jr. and E. E. Gbur have prepared a contribution entitled "Multiyear, 
Through the Season Crop Acreage Estimation Using Estimated Acreage in Sample 
Segments" for that special issue. That contribution has been refereed and 
accepted. A copy of that paper is attached to this final report. 

Some additional research has been done on the empirical behavior of the
transformations $y(\hat{p})$ used in conjunction with models (1) and (2). The
simplest transformation $y(\hat{p})$ of the estimated segment crop acreage
proportion $\hat{p}$ to use in (1) or (2) is the identity transformation

$$y(\hat{p}) = \hat{p}.$$

However, it is very doubtful that the additive model (1) would hold for
$y(\hat{p}) = \hat{p}$, particularly if the $\hat{p}$'s exhibit a large variation
within the stratum. On the other hand a multiplicative model for $\hat{p}$ may
be more reasonable. For instance, if

(i) 30% of the stratum contains wheat at the time wheat is harvested in 
year t; 

(ii) the s-th segment's wheat acreage proportion averages only 80% of 
the stratum's wheat acreage proportion at harvest time; 

(iii) the at-harvest acreage estimate made at mid-season is only 70% 
of the at-harvest estimate made at harvest time; 
and 

(iv) the sampling and classification errors cause the estimated at-harvest 
acreage to be 110% of what it would be without these errors, 

then 

$$\hat{p}_{ts\ell} = (.30)(.80)(.70)(1.10).$$

Here a logarithmic transformation, $y(\hat{p}) = \ln(\hat{p})$, would be
appropriate and

$$y(\hat{p}_{ts\ell}) = \alpha_t + b_s + \delta_\ell + e_{ts\ell} = \ln(.30) + \ln(.80) + \ln(.70) + \ln(1.10).$$

The logit transformation,

$$y(\hat{p}) = (1/2) \ln[\hat{p}/(1-\hat{p})],$$

is another useful transformation which approximately converts a multiplicative
model for $\hat{p}$ into an additive model for $y(\hat{p})$. A small advantage of
the logit transformation is that it guarantees that

$$0 < \hat{P}_T = y^{-1}(\hat{\alpha}_T) < 1,$$

whereas the logarithmic transformation only guarantees

$$\hat{P}_T = y^{-1}(\hat{\alpha}_T) > 0,$$

and the identity transformation makes no guarantees.
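The three transformations and their inverses can be written out directly. The
sketch below is illustrative only; the dictionary name `TRANSFORMS` and the
function names are hypothetical. It also reproduces the multiplicative wheat
example above on the log scale.

```python
import math


def logit_half(p):
    """Logit transformation as defined above: (1/2) ln[p / (1 - p)]."""
    return 0.5 * math.log(p / (1.0 - p))


def inv_logit_half(y):
    """Inverse of logit_half; lies strictly between 0 and 1 for moderate y."""
    return 1.0 / (1.0 + math.exp(-2.0 * y))


# name -> (transformation y, inverse transformation y^{-1})
TRANSFORMS = {
    "identity": (lambda p: p, lambda y: y),
    "log": (math.log, math.exp),
    "logit": (logit_half, inv_logit_half),
}


if __name__ == "__main__":
    # Multiplicative example: the four factors add on the log scale.
    factors = [0.30, 0.80, 0.70, 1.10]
    y_value = sum(math.log(f) for f in factors)
    print(math.exp(y_value))  # equals the product of the four factors
    for name, (forward, inverse) in TRANSFORMS.items():
        print(name, inverse(forward(0.3)))
```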

All three of the above transformations are considered in Dahm and Sielken
(1981) where approximate expressions are derived for

(i) the bias of $y^{-1}(\hat{\alpha}_T)$,

(ii) the mean squared error of $y^{-1}(\hat{\alpha}_T)$, and

(iii) confidence intervals on $P_T$.

These derivations are all similar and are based upon Taylor series approximations
(statistical differentials). For instance, if $y(\hat{p}) = \ln(\hat{p})$, then

$$\hat{P}_T = y^{-1}(\hat{\alpha}_T) \approx y^{-1}(\alpha_T) + (\hat{\alpha}_T - \alpha_T) \left. \frac{dy^{-1}(\alpha)}{d\alpha} \right|_{\alpha = \alpha_T} = P_T + (\hat{\alpha}_T - \alpha_T) P_T,$$

so that

$$MSE(\hat{P}_T) = E[(\hat{P}_T - P_T)^2] \approx P_T^2 \, Var(\hat{\alpha}_T).$$

A small simulation study was conducted in order to observe the empirical
behavior of the estimators of the components of models (1) and (2) (namely,
$\sigma_b^2$, $\sigma_\varepsilon^2$, and $\gamma = \sigma_b^2/\sigma_\varepsilon^2$)
and the estimators of the stratum's crop acreage proportions over the years
t = 1, ..., T (namely, $y^{-1}(\hat{\alpha}_1)$, ..., $y^{-1}(\hat{\alpha}_T)$).
In this simulation study each of the three transformations (identity, log, and
logit) was used to generate a random data set corresponding to each of four
underlying situations. Each of the twelve data sets was analyzed three times:
once using the identity transformation, once using the log transformation, and
once using the logit transformation. Thus each data set was analyzed once using
the "correct" transformation and twice using an "incorrect" transformation.
Since the "correct" transformation is unknown in practice, the simulation study
provided a limited evaluation of the sensitivity of the estimators to the
"correctness" of the transformation being used in the statistical analysis. All
underlying simulated situations involved

i) 3 years with the stratum crop acreage proportions being 0.6, 0.6, and
0.4 for years 1, 2, and 3 respectively;

ii) 3 seasons with the seasonal biases being $\delta_1 = -0.3$,
$\delta_2 = -0.1$, and $\delta_3 = 0$ respectively;

iii) 10 segments observed in each season in each year; and

iv) no partial identification.

The variance among segments $\sigma_b^2$, the variance within segments
$\sigma_\varepsilon^2$, and $\gamma = \sigma_b^2/\sigma_\varepsilon^2$ took on
different values in each situation; the four combinations were
($\sigma_b^2 = 0.0004$, $\sigma_\varepsilon^2 = 0.001$, $\gamma = 0.4$),
($\sigma_b^2 = 0.0004$, $\sigma_\varepsilon^2 = 0.0001$, $\gamma = 4$),
($\sigma_b^2 = 0.004$, $\sigma_\varepsilon^2 = 0.001$, $\gamma = 4$), and
($\sigma_b^2 = 0.004$, $\sigma_\varepsilon^2 = 0.0001$, $\gamma = 40$). The
estimators of $\sigma_b^2$, $\sigma_\varepsilon^2$, and $\gamma$ are shown in
Tables 1-4 for each of the four situations. Also in these tables are the
estimators and approximate 90% confidence intervals for the stratum's crop
acreage proportions $P_1$, $P_2$, and $P_3$ for the three years.

In the simulation study the estimators of $\sigma_b^2$, $\sigma_\varepsilon^2$,
and $\gamma$ were not precise. However, these estimators are usually of only
secondary importance. The primary conclusion from the simulation studies was
that the estimators and 90% confidence intervals for the stratum's crop acreage
proportions, which are of primary importance, behaved quite well and were
relatively robust with respect to the transformation used.



Table 1. Situation No. 1 in the Simulation Study of the Multiyear Model Estimators
and their Robustness to the Transformations Involved

Table 2. Situation No. 2 in the Simulation Study of the Multiyear Model Estimators
and their Robustness to the Transformations Involved

Table 3. Situation No. 3 in the Simulation Study of the Multiyear Model Estimators
and their Robustness to the Transformations Involved

8. CONCLUDING REMARKS


The primary purpose of this research effort has been to identify and develop
statistical procedures for large area assessments using both satellite and
conventional data. Crop acreages, other ground cover indices, and measures of
change have been the principal characteristics of interest. The characteristics
are capable of being estimated from samples collected possibly from several
sources (different satellites, aerial surveys, ground measurements, etc.) at
varying times (different years, seasons, crop calendar days, etc.) with different
levels of identification (for example, vegetation, crops, summer crops, corn).

The overall objective has been to be able to obtain the most precise large area
estimates from multiyear samples including possibly partially identified sample
units. Included in this research have been

a) extensions of multiyear analysis techniques to include partially
identified samples,

b) the determination of the best current year sampling design corresponding
to a given sampling history,

c) determination and utilization of observation weights reflecting the
precision or confidence in each observation, and

d) quantification of the variation in estimates incorporating partially
identified samples.

The development and utilization of observation weights reflecting the observation's
precision may be a very fruitful area for additional research.

ABSTRACT

Large scale crop surveys can be made frequently and inexpensively
during a crop growing season using Landsat data. A crop's
at-harvest acreage in a stratum can be estimated from the
crop's estimated at-harvest acreage in a small sample of the
stratum's segments. The stratum estimate can utilize Landsat
imagery obtained during the current crop growing season and in
previous years. A mixed effects analysis of variance model is
used to generate a weighted least squares estimate of the
stratum at-harvest acreage proportion for the current year.
Similar Landsat based stratum crop proportion estimates can be
combined with historical information on non-sampled (or
unsuccessfully sampled) strata to provide crop acreage estimates
for large regions. These regional estimates of the at-harvest
acreage can be determined early in the crop growing season, at
different intermediate points, and at harvest time.


1. INTRODUCTION 


Agriculture and other renewable resources can be economically
inventoried over large areas using aerospace remote sensing
techniques. In particular, the surface area devoted to a specific
resource in a large region is especially amenable to aerospace
estimation. Such resources could be as broadly defined as
agriculture, forest, water, snow cover, etc. or as specifically defined
as summer crops or corn. These area estimates can be combined with
other measures such as estimated yield per acre to obtain
production estimates. Once the appropriate estimation methodology has
been successfully implemented, the successive estimates are very
economical, so that frequent inventories are realistically
obtainable.

During 1975-1977 NASA in conjunction with the USDA conducted
the Large Area Crop Inventory Experiment (LACIE) to illustrate the
potential capabilities of aerospace remote sensing techniques.
This pioneering effort also served to remove many of the obstacles
for future applications. A summary of the experiment is given in
the proceedings of the LACIE Symposium (1979). The target
resource in LACIE was the wheat acreage and production in the U. S.
Great Plains.

During the transition years 1977-1979 and during 1979-1983
under the recently-terminated AgRISTARS (Agriculture and Resources
Inventory Surveys through Aerospace Remote Sensing) program
several advances were made in satellite imagery technology, data
processing, and statistical methodologies. In addition, target
resources were expanded to include other crops and other countries,
as well as non-crop resources.

This paper focuses on the statistical methodology for estimating
a particular resource's acreage proportion in a large region
at a specified point in time using the estimated resource acreage
proportion in a sample of smaller areas. It will be assumed that

(i) the resource is a crop,

(ii) the specified time point of interest is the harvest
time for the crop,

(iii) the sample areas are all the same size (a 5x6
nautical mile rectangle called a segment), and

(iv) the sample segments are relatively "small" compared
to the homogeneous region (stratum) of interest.

Also, it is assumed that in each year of a multiyear period a
sample of segments is selected. The composition of the sample
may vary year to year. In each year each sample segment's
at-harvest crop acreage proportion is estimated at one or more
times during the crop growing season. The number of estimates
is not necessarily the same for all sample segments in a year
and is not necessarily the same for each year. Obviously, this
paper is focusing on only one part of a much larger problem.

The region of concern herein is really just one stratum in a
stratified sample survey of a country or the world (see, for
example, Chhikara and Feiveson (1982)). The size of the sample
segment is assumed to be predetermined (see Chhikara and
Feiveson (1982) and in this issue Chhikara et al. (1984)).

Also, since the same segments do not have to be in the sample
every year, there is an interesting associated problem of
determining an optimal multiyear sampling design (see Chhikara
et al. (1984) and the technical reports listed in the
bibliography). The papers by Heydorn (1984) and Hall and Houston
(1984) in this issue discuss the determination of the sample
segment's estimated at-harvest crop acreage proportion.

Finally, the estimates arising from the statistical methodology
in this paper can be input to procedures for aggregating
acreage over several regions and combining acreage estimates
with yield estimates to obtain production estimates. The
paper by Feiveson (1984) in this issue addresses these latter
needs.

H. O. Hartley during his years (1963-1979) as Distinguished
Professor of Statistics at the Institute of Statistics,
Texas A&M University, contributed greatly to NASA's
research efforts pertaining to crop acreage estimation, and
his ideas have frequently stimulated his co-workers' efforts.
The seeds for many of the sampling and modeling techniques
utilized in several of the papers in this issue were sown by
him.

2. BASIC MODEL FOR MULTIYEAR ESTIMATION 

Each stratum at-harvest crop acreage proportion could
be modeled using a regression approach with explanatory
variables such as the past, present, and anticipated economic and
meteorological conditions. However, the unknown form of the
regression model, the large number of possible explanatory
variables, and the difficulty in obtaining reasonable values
for these variables make this approach unattractive.
Nevertheless, the combined effect of all of these variables is
reflected in the crop acreage proportions for the stratum
segments. Although it is not economical to estimate the
at-harvest crop acreage proportion for every segment in the
stratum, it is feasible to estimate them for a sample of
segments using Landsat data (see, for example, Hall and Houston
(1984) and Heydorn (1984), both in this issue). Hence, an
alternative approach is to model the estimated at-harvest crop
acreage proportion for a sample segment in terms of

(i) the stratum at-harvest crop acreage proportion,

(ii) stratum-wide influences which vary from year to
year,

(iii) characteristics of the segment itself,

(iv) yearly influences which affect different segments
differently, and

(v) the proportion of the growing season which has
passed at the time the estimate is determined.

These factors may only contribute roughly additively to a
transformation of the segment at-harvest crop acreage proportion and
may not contribute additively to the segment proportion itself.

One specific model which is compatible with these ideas is

$$y(\hat{p}_{ts\ell}) = \alpha_t + b_s + \delta_\ell + e_{ts\ell}, \quad t = 1, \ldots, T; \; s = 1, \ldots, S; \; \ell = 1, \ldots, L, \qquad (1)$$

where

$\hat{p}_{ts\ell}$ = the estimated proportion of the s-th segment's
acreage that will contain the crop at harvest
time in the t-th year when the estimate is made
at crop calendar time $\ell$ (for example, $\ell = 1$ could
denote early season, $\ell = 2$ mid-season, and $\ell = 3$
harvest time);

$y(\hat{p}_{ts\ell})$ = a transformation of $\hat{p}_{ts\ell}$;

$\alpha_t$ = the stratum's transformed crop acreage proportion
for the t-th year;

$b_s$ = the s-th sampled segment's departure from the
stratum's transformed crop acreage proportion;
the $b_s$'s are independent random variables, each
with mean zero and variance $\sigma_b^2$;

$\delta_\ell$ = the systematic difference between the estimates of
the crop's transformed at-harvest acreage proportion
made at the $\ell$-th crop calendar time and the
corresponding estimate made at harvest time
($\delta_L = 0$);

$e_{ts\ell}$ = the aggregate of sampling and classification errors
in the transformed data; the $e_{ts\ell}$'s are independent
random variables each with mean zero.

This model is, of course, not the most general model possible.
In particular, the segment effects $b_s$ are assumed to be
independent of the crop calendar time and the year. Also the
departures of the transformed observations $y(\hat{p}_{ts\ell})$ on the same
segment from their fixed year effects $\alpha_t$ and their fixed
estimation time effects $\delta_\ell$ are assumed to be positively
correlated. The error terms $e_{ts\ell}$ are the composite effect of many
components and need not have homogeneous variances; in
particular see Heydorn (1984) for a detailed discussion of the
classification error components.

The primary objective is to estimate the crop's at-harvest
proportion of the stratum acreage in the current
year, T; that is, estimate $P_T = y^{-1}(\alpha_T)$. Secondary
objectives could be improved estimates of at-harvest acreages in
previous years or estimates of changes in the stratum
at-harvest crop acreage proportion from year to year.

Estimates of the stratum at-harvest crop acreage proportion
are also often desired throughout the current year as
well as at harvest time. For example, an early season
estimate of $P_T$ based on observations for $\ell = 1, \ldots, L$ for
$t = 1, \ldots, T-1$ and only $\ell = 1$ for $t = T$ is frequently desired.

Even though the estimate $\hat{P}_T = y^{-1}(\hat{\alpha}_T)$ of the stratum
at-harvest crop acreage proportion for the current year involves
only $\hat{\alpha}_T$, this estimate depends on the entire multiyear data
set and not just the data from year T since the segment effects
($b_s$'s) and systematic estimation time biases ($\delta_\ell$'s) are assumed
to be constant from year to year.

Special cases of model (1) have also been considered. For
example, Chhikara et al. (1984) consider at-harvest estimates
made only at harvest time, so that their model is

$$\hat{p}_{ts} = \alpha_t + b_s + e_{ts}, \quad t = 1, \ldots, T \text{ and } s = 1, \ldots, S.$$

For simplicity Feiveson (1984) considers only estimates of the
stratum at-harvest crop acreage proportion made at harvest time
during the current year; i.e.,

$$\hat{p}_{Ts} = \alpha_T + e_{Ts}, \quad s = 1, \ldots, S.$$

When such data are not available, Feiveson (1984) utilizes
historical data from agricultural reports even though previous
Landsat data could also be incorporated. The methodology in
both of these papers can be extended to incorporate the more
general model (1).


3. TRANSFORMATIONS OF THE ESTIMATED SEGMENT PROPORTIONS 

The simplest transformation $y(\hat{p})$ of the estimated segment crop
acreage proportion $\hat{p}$ to use in (1) is the identity transformation

$$y(\hat{p}) = \hat{p}.$$

However, it is very doubtful that the additive model (1) would hold
for $y(\hat{p}) = \hat{p}$, particularly if the $\hat{p}$'s exhibit a large
variation within the stratum. On the other hand a multiplicative model for
$\hat{p}$ may be more reasonable. For instance, if

(i) 30% of the stratum contains wheat at the time wheat is
harvested in year t;

(ii) the s-th segment's wheat acreage proportion averages only
80% of the stratum's wheat acreage proportion at harvest
time;

(iii) the at-harvest acreage estimate made at mid-season is
only 70% of the at-harvest estimate made at harvest time;
and

(iv) the sampling and classification errors cause the estimated
at-harvest acreage to be 110% of what it would be
without these errors,

then

$$\hat{p}_{ts\ell} = (.30)(.80)(.70)(1.10).$$

Here a logarithmic transformation, $y(\hat{p}) = \ln(\hat{p})$, would be
appropriate and

$$y(\hat{p}_{ts\ell}) = \alpha_t + b_s + \delta_\ell + e_{ts\ell} = \ln(.30) + \ln(.80) + \ln(.70) + \ln(1.10).$$

The logit transformation,

$$y(\hat{p}) = (1/2) \ln[\hat{p}/(1-\hat{p})],$$

is another useful transformation which approximately converts a
multiplicative model for $\hat{p}$ into an additive model for $y(\hat{p})$. A
small advantage of the logit transformation is that it guarantees that

$$0 < \hat{P}_T = y^{-1}(\hat{\alpha}_T) < 1,$$

whereas the logarithmic transformation only guarantees

$$\hat{P}_T = y^{-1}(\hat{\alpha}_T) > 0,$$

and the identity transformation makes no guarantees.

All three of the above transformations are considered in Dahm
and Sielken (1981) where approximate expressions are derived for

(i) the bias of $y^{-1}(\hat{\alpha}_T)$,

(ii) the mean squared error of $y^{-1}(\hat{\alpha}_T)$, and

(iii) confidence intervals on $P_T$.

These derivations are all similar and are based upon Taylor series
approximations (statistical differentials). For instance, if
$y(\hat{p}) = \ln(\hat{p})$, then

$$\hat{P}_T = y^{-1}(\hat{\alpha}_T) \approx y^{-1}(\alpha_T) + (\hat{\alpha}_T - \alpha_T) \left. \frac{dy^{-1}(\alpha)}{d\alpha} \right|_{\alpha = \alpha_T} = P_T + (\hat{\alpha}_T - \alpha_T) P_T,$$

so that

$$MSE(\hat{P}_T) = E[(\hat{P}_T - P_T)^2] \approx P_T^2 \, Var(\hat{\alpha}_T).$$
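The quality of this first order approximation for the logarithmic
transformation can be examined in a case where the mean squared error is
available in closed form. The sketch below assumes, purely for illustration,
that $\hat{\alpha}_T$ is normally distributed with mean $\alpha_T$; that
assumption and the function names are not part of the paper.

```python
import math


def mse_delta_method(p_t, var_alpha):
    """First order approximation: MSE(P_T hat) ~ P_T^2 * Var(alpha_T hat)."""
    return p_t ** 2 * var_alpha


def mse_exact_lognormal(p_t, var_alpha):
    """Exact E[(exp(a_hat) - P_T)^2] when a_hat ~ Normal(ln P_T, var_alpha).

    Uses E[exp(a_hat)] = P_T * exp(var/2) and
    E[exp(2 a_hat)] = P_T^2 * exp(2 var).
    """
    return p_t ** 2 * (
        math.exp(2.0 * var_alpha) - 2.0 * math.exp(var_alpha / 2.0) + 1.0
    )


if __name__ == "__main__":
    for var_alpha in (0.001, 0.01, 0.1):
        approx = mse_delta_method(0.3, var_alpha)
        exact = mse_exact_lognormal(0.3, var_alpha)
        print(var_alpha, approx, exact, exact / approx)
```

Under this normality assumption the approximation is close when
$Var(\hat{\alpha}_T)$ is small and deteriorates as it grows.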


4. THE WEIGHTED LEAST SQUARES ANALYSIS OF THE SEGMENT ESTIMATES

The probable heteroscedasticity of the $y(\hat{p}_{ts\ell})$'s suggests that
the mixed effects model (1) should be analyzed in the form

$$w_{ts\ell} \, y(\hat{p}_{ts\ell}) = w_{ts\ell} \alpha_t + w_{ts\ell} b_s + w_{ts\ell} \delta_\ell + \varepsilon_{ts\ell} \qquad (2)$$

where $\varepsilon_{ts\ell} = w_{ts\ell} e_{ts\ell}$ and $w_{ts\ell}$ is proportional to
$\{Var[y(\hat{p}_{ts\ell})]\}^{-1/2}$.

In matrix notation (2) can be written as

$$Wy = WX \binom{\alpha}{\delta} + WUb + I\varepsilon \qquad (3)$$

where

y = $(y_{111}, y_{112}, \ldots, y_{TSL})'$,

$\alpha$ = $(\alpha_1, \ldots, \alpha_T)'$,

$\delta$ = $(\delta_1, \delta_2, \ldots, \delta_{L-1})'$ (since $\delta_L = 0$),

b = $(b_1, b_2, \ldots, b_S)'$,

W = matrix containing the $w_{ts\ell}$'s,

X = design matrix of 0's and 1's corresponding to the
fixed effects $\alpha_t$ and $\delta_\ell$,

U = sampling design matrix of 0's and 1's corresponding
to the sampling pattern for the distinct segments,
and

I = identity matrix.

In (3) the random portion of Wy is $WUb + I\varepsilon$ which has covariance

$$V\sigma_\varepsilon^2 = I\sigma_\varepsilon^2 + WUU'W'\sigma_b^2 = (I + WUU'W'\gamma)\sigma_\varepsilon^2,$$

where $\sigma_\varepsilon^2 = Var(\varepsilon_{ts\ell})$ and
$\gamma = \sigma_b^2/\sigma_\varepsilon^2$. Hence, the usual weighted least
squares estimator of $(\alpha', \delta')'$ is

$$\binom{\hat{\alpha}}{\hat{\delta}} = (X'W'V^{-1}WX)^{-1} X'W'V^{-1}Wy \qquad (4)$$

and

$$Var\left[\binom{\hat{\alpha}}{\hat{\delta}}\right] = (X'W'V^{-1}WX)^{-1} \sigma_\varepsilon^2.$$

In particular

$$Var(\hat{\alpha}_T) = (X'W'V^{-1}WX)^{-1}_{T,T} \, \sigma_\varepsilon^2 \qquad (5)$$

where $( \; )^{-1}_{T,T}$ denotes the T-th element on the diagonal of the matrix
inverse.
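The matrix computations in (3) through (5) can be sketched with NumPy as
follows. This is a minimal illustration, not the authors' program: the
function name and the observation layout `(t, s, l, y, w)` are hypothetical,
W is assumed to be diagonal, and both the weights and the variance ratio
$\gamma$ are taken as known, which sidesteps the obstacles discussed next.

```python
import numpy as np


def multiyear_wls(obs, n_years, n_segments, n_times, gamma):
    """Weighted least squares estimate of (alpha, delta) as in (4).

    obs: list of (t, s, l, y, w) with 1-based year t, segment s, and crop
    calendar time l; y is the transformed proportion and w its weight.
    Returns the estimate and (X'W'V^{-1}WX)^{-1}, i.e. the covariance
    matrix up to the factor sigma_epsilon^2.
    """
    n = len(obs)
    X = np.zeros((n, n_years + n_times - 1))
    U = np.zeros((n, n_segments))
    y = np.zeros(n)
    w = np.zeros(n)
    for i, (t, s, l, y_i, w_i) in enumerate(obs):
        X[i, t - 1] = 1.0  # year effect alpha_t
        if l < n_times:  # delta_L = 0, so the last time has no column
            X[i, n_years + l - 1] = 1.0
        U[i, s - 1] = 1.0  # which segment the observation belongs to
        y[i] = y_i
        w[i] = w_i
    W = np.diag(w)
    V = np.eye(n) + gamma * W @ U @ U.T @ W.T
    WX = W @ X
    Vinv_WX = np.linalg.solve(V, WX)  # V^{-1} W X
    A = WX.T @ Vinv_WX  # X' W' V^{-1} W X
    rhs = Vinv_WX.T @ (W @ y)  # X' W' V^{-1} W y, using symmetry of V
    estimate = np.linalg.solve(A, rhs)
    cov_unscaled = np.linalg.inv(A)
    return estimate, cov_unscaled


if __name__ == "__main__":
    alpha, delta = [-0.5, -0.9], [-0.3, 0.0]
    obs = [(t, s, l, alpha[t - 1] + delta[l - 1], 1.0 + 0.1 * s)
           for t in (1, 2) for s in (1, 2, 3) for l in (1, 2)]
    est, cov = multiyear_wls(obs, 2, 3, 2, gamma=4.0)
    print(est)
    print(cov[1, 1])  # unscaled variance of the current year effect, as in (5)
```

The first `n_years` entries of the estimate are the $\hat{\alpha}_t$'s and the
remaining entries are $\hat{\delta}_1, \ldots, \hat{\delta}_{L-1}$.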

Although the formulas in (4) and (5) are fairly standard, there
are several obstacles to be overcome before they can be applied. The
detailed procedures for overcoming them are given in Dahm and Sielken
(1981). Only the nature of the obstacles and the basic approach to
overcoming them are discussed here.

An initial obstacle is that the y vector is not computable if
any $y(\hat{p}_{ts\ell})$ corresponds to either the logarithmic transformation
with $\hat{p}_{ts\ell} = 0$ or the logit transformation with
$\hat{p}_{ts\ell} = 0$ or 1. Although $\hat{p}_{ts\ell} = 1$ would be highly
unexpected, $\hat{p}_{ts\ell} = 0$ is quite common. This obstacle can be
overcome through the use of "working y's" as in Finney (1964); that is, by

(i) estimating the parameters in (3) using only the data for
which y is calculable;

(ii) substituting the estimated parameters from (i) into (1)
along with $e_{ts\ell} = 0$ to obtain approximate $y_{ts\ell}$'s, say
$y^*_{ts\ell}$, and approximate $p_{ts\ell}$'s, say
$p^*_{ts\ell} = y^{-1}(y^*_{ts\ell})$; and finally

(iii) creating working values for $y(\hat{p}_{ts\ell})$ using a first order
Taylor series expansion of y(p) about $p = p^*_{ts\ell}$.

These working y's can then be used in (4).

A second obstacle to using both (4) and (5) is that V contains
the unknown variance component ratio $\gamma = \sigma_b^2/\sigma_\varepsilon^2$.
If $\gamma$ is replaced by an independent consistent estimator $\hat{\gamma}$,
then (5) is asymptotically correct. When such a $\hat{\gamma}$ is unavailable,
a reasonable alternative is to treat (3) as if it were a fixed effects model
and obtain estimates of $\sigma_b^2$ and $\sigma_\varepsilon^2$ (and hence
their ratio $\gamma$) by equating certain sums of squares from the fixed
effects model analysis with their expectations under the mixed model. This is
basically Henderson's Method 3 (see, for example, Searle (1971)).

Finally, the weight matrix W is unknown since $Var[y(\hat{p}_{ts\ell})]$ is
unknown. A first order Taylor series approximation can be used to
relate $Var[y(\hat{p}_{ts\ell})]$ to $Var(\hat{p}_{ts\ell})$. For example, if
$y(\hat{p}) = \ln(\hat{p})$ and $\hat{p}$ is distributed with mean p and
variance $\sigma_p^2$, then

$$y(\hat{p}) \approx \ln(p) + (\hat{p} - p) \left. \frac{dy(\hat{p})}{d\hat{p}} \right|_{\hat{p} = p} = \ln(p) + (\hat{p} - p)/p,$$

so that

$$E[y(\hat{p})] \approx \ln(p)$$

and

$$Var[y(\hat{p})] \approx E[(\hat{p} - p)^2/p^2] = \sigma_p^2/p^2.$$

In this manner the form of W can be identified. Replacing p by $\hat{p}$
would yield an estimate of W if $\sigma_p^2$ could be estimated. One approach
to estimating $\sigma_p^2$ is to assume that $N\hat{p}$ is binomially
distributed for some unknown value of N which is constant for all segments.
Then $\sigma_p^2$ is proportional to p(1 - p) and in the above example
$Var[y(\hat{p})]$ is proportional to (1 - p)/p which can be estimated by
$(1 - \hat{p})/\hat{p}$. A slight improvement can sometimes be obtained by
iterating on the estimates of W and the p's. An alternative method of
obtaining an estimate of W is currently under investigation. Here
$Var(\hat{p}_{ts\ell})$ is approximated primarily on the basis of information
such as

(i) the type of satellite being used,

(ii) the sharpness of the satellite imagery,

(iii) the season during which the estimate is being made,

(iv) the number of satellite images successfully obtained
by the time the segment proportion is estimated,

(v) the nearness of the segment's observed behavior to
classical crop profiles,

(vi) the weather conditions during the crop's growing
season, and

(vii) the physical characteristics of the segment.

This alternative approach may be particularly appropriate for
multiyear data sets where the remote sensing technology and segment
proportion estimation methodology are changing from year to year.
In addition, recognizable segment characteristics which make it
either easier or harder to estimate the segment crop proportion
can be incorporated. Obvious differences in the amount of
information going into the $\hat{p}_{ts\ell}$'s can also be reflected. These
latter differences can be due to the estimation times themselves as well
as due to loss of satellite imagery from cloud cover, machine
failure, etc.
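The binomial-based weights for the logarithmic transformation described
earlier in this section can be sketched in a few lines. The sketch assumes
that each weight is proportional to the reciprocal of the standard deviation
of $y(\hat{p})$, so that $w \propto [\hat{p}/(1-\hat{p})]^{1/2}$ when
$Var[y(\hat{p})] \propto (1-\hat{p})/\hat{p}$; the function name is
hypothetical and the constant of proportionality is set to 1.

```python
import math


def log_transform_weight(p_hat):
    """Weight for y(p) = ln(p) under the binomial variance assumption.

    Var[ln(p_hat)] is taken proportional to (1 - p_hat) / p_hat, so the
    weight is the reciprocal square root of that quantity.
    """
    if not 0.0 < p_hat < 1.0:
        # ln(0) is not computable and the weight is unbounded at 1;
        # such observations need the "working y" treatment instead.
        raise ValueError("p_hat must lie strictly between 0 and 1")
    return math.sqrt(p_hat / (1.0 - p_hat))


if __name__ == "__main__":
    for p_hat in (0.05, 0.2, 0.5, 0.8):
        print(p_hat, log_transform_weight(p_hat))
```

Segments with small estimated proportions receive small weights, reflecting
the large relative variability of $\ln(\hat{p})$ near zero.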

5. AN EXAMPLE

The technical reports cited in the bibliography as well as
the paper by Chhikara et al. (1984) in this issue indicate the
theoretical advantages of basing estimators on the full multiyear
data set as opposed to only the data from a single year. Even when
there are only 2 or 3 years' data available, the accuracy of the
current year's at-harvest crop proportion estimate can often be
improved by as much as 50% by utilizing the multiyear estimation
procedures. Of course, the improvement depends on the multiyear
sampling design and the underlying value of
$\gamma = \sigma_b^2/\sigma_\varepsilon^2$.

Some of the potential benefits of the multiyear estimation
procedure in a real-world setting are seen in the following
example. The Landsat based estimates of the at-harvest wheat
acreages computed at harvest times during each of 1976, 1977, and 1978
for 108 sample segments in the Great Plains states were available
to the authors. Although these sample segment estimates were
determined for other purposes, they can also be used to evaluate
proposed statistical procedures. In an experiment the following
procedure was repeated 200 times:

(a) Randomly select (without replacement) 40 segments from
the 108 available.

(b) Treat this sample of 40 segments with their 3 years
of estimated at-harvest wheat acreages as the simulated
"stratum" whose at-harvest wheat acreage proportion
is to be estimated. Determine the true at-harvest
wheat acreage proportion for 1978 for this
"stratum". This proportion is the estimation target
for this repetition.

(c) Assume the following multiyear rotation sampling
design. In 1976 a random sample of 5 segments from
the stratum of 40 segments is observed. In 1977
three of these five are observed again along with
two new randomly selected segments. Finally in
1978 one of the three segments observed in both 1976
and 1977 is observed a third time, the two new
segments in 1977 are observed a second time in 1978,
and two totally new randomly selected segments
are observed. Schematically the sampling design
of 5 segments per year is as follows:

Segment Number    1976    1977    1978

      1            X
      2            X
      3            X       X
      4            X       X
      5            X       X       X
      6                    X       X
      7                    X       X
      8                            X
      9                            X

(d) The multiyear estimation procedure described in section 4
is carried out using $y(\hat{p}) = \ln(\hat{p})$. The multiyear
estimate, $y^{-1}(\hat{\alpha}_T)$, of the stratum's at-harvest
wheat acreage proportion in 1978 is computed. The
corresponding single-year estimate is also computed
using only the 1978 sample data.

(e) The corresponding estimation errors are the differences
between the simulated stratum's 1978 at-harvest
wheat acreage proportion and the multiyear and
single-year estimates.
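The rotation sampling design in step (c) can be sketched as follows. This is
an illustration of the selection pattern only, not the code used in the
experiment; the function name is hypothetical and segments are represented by
arbitrary identifiers.

```python
import random


def rotation_sample(stratum_ids, rng):
    """Draw the 1976, 1977, and 1978 samples of 5 segments each."""
    pool = list(stratum_ids)
    s1976 = rng.sample(pool, 5)
    # 1977: three of the 1976 segments plus two new ones.
    kept_1977 = rng.sample(s1976, 3)
    unused = [i for i in pool if i not in s1976]
    new_1977 = rng.sample(unused, 2)
    s1977 = kept_1977 + new_1977
    # 1978: one segment seen in both earlier years, the two segments
    # that were new in 1977, and two totally new segments.
    kept_1978 = rng.sample(kept_1977, 1)
    unused = [i for i in unused if i not in new_1977]
    new_1978 = rng.sample(unused, 2)
    s1978 = kept_1978 + new_1977 + new_1978
    return s1976, s1977, s1978


if __name__ == "__main__":
    rng = random.Random(1978)
    stratum = rng.sample(range(1, 109), 40)  # step (a): 40 of 108 segments
    for year, sample in zip((1976, 1977, 1978), rotation_sample(stratum, rng)):
        print(year, sorted(sample))
```

Nine distinct segments are observed over the three years, matching the
schematic of the design.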

The average absolute value of the errors was 0.046 for the multiyear
estimator and 0.072 for the single-year estimator. Thus, the
average absolute error for the single-year estimator was approximately
1.6 (0.072/0.046 = 1.57) times as great as the average
absolute error for the multiyear estimator. All of the other measures
of empirical behavior considered also favored the multiyear
estimator. The average squared errors for the multiyear and
single-year estimators were 0.0033 and 0.0073, respectively. The average
biases relative to the average 1978 at-harvest wheat acreage
proportion for the entire 108 segments were 0.002 and -0.047. The
sample standard deviations of the multiyear and single-year
procedures were 0.061 and 0.076, respectively. Thus, the multiyear
estimation procedure provided a substantial percentage improvement
over the single-year estimator.